Successor Feature Sets: Generalizing Successor Representations Across Policies

Authors

Abstract

Successor-style representations have many advantages for reinforcement learning: for example, they can help an agent generalize from past experience to new goals, and they have been proposed as explanations of behavioral and neural data from human and animal learners. They also form a natural bridge between model-based and model-free RL methods: like the former they make predictions about future experiences, and like the latter they allow efficient prediction of total discounted rewards. However, successor-style representations are not optimized to generalize across policies: typically, we maintain a limited-length list of policies, and share information among them by representation learning or GPI. Successor-style representations also typically make no provision for gathering information or reasoning about latent variables. To address these limitations, we bring together ideas from predictive state representations, belief space value iteration, successor features, and convex analysis: we develop a new, general successor-style representation, together with a Bellman equation that connects multiple sources of information within this representation, including different latent states, policies, and reward functions. The new representation is highly expressive: for example, it lets us efficiently read off an optimal policy for a new reward function, or a policy that imitates a new demonstration. For this paper, we focus on exact computation of the new representation in small, known environments, since even this restricted setting offers plenty of interesting questions. Our implementation does not scale to large, unknown environments, nor would we expect it to, since it generalizes POMDP value iteration, which is difficult to scale. However, we believe that future work will allow us to extend our ideas to approximate reasoning in large, unknown environments. We conduct experiments to explore which of the potential barriers to scaling are most pressing.
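As background for the claim that successor-style representations allow efficient prediction of total discounted rewards, the following is a minimal sketch of the classical tabular successor representation for a single fixed policy. It is not the paper's method, which generalizes across policies and latent states. The three-state chain, the names `P`, `GAMMA`, `successor_representation` and `values`, and the convention that reward is received in the current state are all assumptions made for illustration.

```python
# Tabular successor representation (SR) for ONE fixed policy.
# Illustrative toy example only; the environment below is hypothetical.
GAMMA = 0.9

# P[s][t]: probability of moving from state s to state t under the policy.
# State 2 is absorbing.
P = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]


def successor_representation(P, gamma, iters=500):
    """Fixed-point iteration M <- I + gamma * P @ M.

    M[s][t] approximates the expected discounted number of visits to t
    when starting in s (counting the visit at time 0).
    """
    n = len(P)
    M = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        M = [
            [
                (1.0 if s == t else 0.0)
                + gamma * sum(P[s][u] * M[u][t] for u in range(n))
                for t in range(n)
            ]
            for s in range(n)
        ]
    return M


def values(M, r):
    """V(s) = sum_t M[s][t] * r[t]; the value is linear in the reward vector."""
    return [sum(m * x for m, x in zip(row, r)) for row in M]


M = successor_representation(P, GAMMA)

# The same M is reused to evaluate two different reward functions.
v_goal_2 = values(M, [0.0, 0.0, 1.0])  # reward for being in state 2
v_goal_1 = values(M, [0.0, 1.0, 0.0])  # reward for being in state 1
```

Because `M` depends only on the dynamics and the policy, switching to a new reward vector needs one matrix-vector product rather than a fresh round of policy evaluation. The sketch still evaluates only one policy, which is the limitation the abstract sets out to address.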

Similar resources

Reflections on Successor Liability

Deep Successor Reinforcement Learning

Learning robust value functions given raw observations and rewards is now possible with model-free and model-based deep reinforcement learning algorithms. There is a third alternative, called Successor Representations (SR), which decomposes the value function into two components – a reward predictor and a successor map. The successor map represents the expected future state occupancy from any g...

Minimal Indices for Successor Search

We give a new successor data structure which improves upon the index size of the Pǎtraşcu-Thorup data structures, reducing the index size from O(nw) bits to O(n logw) bits, with optimal probe complexity. Alternatively, our new data structure can be viewed as matching the space complexity of the (probe-suboptimal) z-fast trie of Belazzougui et al. Thus, we get the best of both approaches with re...

Successor-Invariance in the Finite

A first-order sentence θ of vocabulary σ ∪ {S} is successor-invariant in the finite if for every finite σ-structure M and successor relations S1 and S2 on M, (M, S1) |= θ ⇐⇒ (M, S2) |= θ. In this paper I give an example of a non-first-order definable class of finite structures which is, however, defined by a successor-invariant first-order sentence. This strengthens a corresponding result for o...

Separation and the Successor Relation

We investigate two problems for a class C of regular word languages. The C-membership problem asks for an algorithm to decide whether an input language belongs to C. The C-separation problem asks for an algorithm that, given as input two regular languages, decides whether there exists a third language in C containing the first language, while being disjoint from the second. These problems are c...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i13.17399